Indexing and Querying XML Data for Regular Path Expressions
Abstract
With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data becomes more and more important. Several XML query languages have been proposed, and the common feature of the languages is the use of regular path expressions to query XML data. This poses a new challenge concerning indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under heavy access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data. We also propose several algorithms for processing regular path expressions, namely, (1) EE-Join for searching paths from an element to another, (2) EA-Join for scanning sorted elements and attributes to find element-attribute pairs, and (3) KC-Join for finding Kleene-Closure on repeated paths or elements. The EE-Join algorithm is highly effective particularly for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an or...
This work was sponsored in part by National Science Foundation CAREER Award (IIS-9876037) and Research Infrastructure program EIA-0080123. The authors assume all responsibility for the contents
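The abstract does not spell out the numbering scheme, so the following is only a minimal sketch under a stated assumption: each element receives a (start, end) pair from a depth-first traversal, and element A is an ancestor of element D exactly when A's interval strictly contains D's. The function names (number_elements, is_ancestor, ancestor_descendant_pairs) are hypothetical, and the nested-loop pairing is a deliberately naive stand-in, not the paper's EE-Join algorithm.

```python
import xml.etree.ElementTree as ET


def number_elements(root):
    """Assign a (start, end) pair to every element via depth-first traversal.

    Assumption for illustration: interval labels, not necessarily the
    numbering scheme used in the paper.
    """
    labels = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        start = counter
        for child in elem:
            visit(child)
        counter += 1
        labels[elem] = (start, counter)

    visit(root)
    return labels


def is_ancestor(anc, desc):
    """True if interval `anc` strictly contains interval `desc`."""
    return anc[0] < desc[0] and desc[1] < anc[1]


def ancestor_descendant_pairs(labels, anc_tag, desc_tag):
    """Naive pairing of all anc_tag elements with their desc_tag descendants.

    Only the labels are compared; the tree itself is never traversed again.
    """
    ancestors = [lab for e, lab in labels.items() if e.tag == anc_tag]
    descendants = [lab for e, lab in labels.items() if e.tag == desc_tag]
    return [(a, d) for a in ancestors for d in descendants if is_ancestor(a, d)]


if __name__ == "__main__":
    doc = ET.fromstring(
        "<book><chapter><section><figure/></section></chapter>"
        "<chapter><figure/></chapter></book>"
    )
    labels = number_elements(doc)
    # Roughly corresponds to a path query such as chapter//figure.
    print(ancestor_descendant_pairs(labels, "chapter", "figure"))
```

The point of the sketch is that once labels are stored, a descendant query of unknown path length reduces to comparing numbers, which is the property the abstract attributes to its numbering scheme.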
Similar resources
iXUPT: Indexing XML Using Path Templates
The XML format has become the standard for data exchange because it is self-describing and stores not only information but also the relationships between data. It is therefore used in very different areas. To find the right information in an XML file, we need fast and effective access to the data. Similar to relational databases, we can create an index in order to speed up the query...
Regular Path Expression for Querying Semistructured Data - Implementation in Prolog
We present regular path expressions (RPE), a language for querying data graphs, and its context-free grammar implementation in Prolog. A proof-of-concept parser and query tool is implemented, and various usage examples are analyzed for semistructured data formats like XML and JSON.
Indexing and Querying Semistructured Data Views of Relational Database
The most promising and dominant data format for data processing and representation on the Internet is the semistructured data form termed XML. XML data has no fixed schema; it evolves and is self-describing, which results in management difficulties compared to, for example, relational data. XML queries differ from relational queries in that the former are expressed as path expressions. The efficien...
Indexing XML Data with UB-trees
Using the terminology usual in databases, it is possible to view XML as a language for data modelling. To retrieve XML data from XML databases, several query languages have been proposed. The common feature of these languages is the use of regular path expressions. Users are allowed to navigate through arbitrarily long paths in the data by regular path expressions. Several index structures for XM...
Indexing XML to Support Path Expressions
The extensible markup language (XML) is rapidly becoming a dominating technology in the area of data intensive applications. Although several implementations are already offered in commercial products, especially DBMSs, there are still open research issues related to efficiency of XML storage and retrieval. This paper introduces and analyses new index structures suitable for support of regular ...
Publication date: 2001